In this paper, we propose a class of high breakdown point estimators for the linear regression model when the response variable contains censored observations. These estimators are robust against high-leverage outliers and they generalize the LMS (least median of squares), S, MM and $\tau$-estimators for linear regression. An important contribution of this paper is that we can define consistent estimators using a bounded loss function (or equivalently, a redescending score function). Since the calculation of these estimators can be computationally costly, we propose an efficient algorithm to compute them. We illustrate their use on an example and present simulation studies that show that these estimators also have good finite sample properties.

1. Introduction. Consider the linear regression model

$$y_i = \beta_0' x_i + u_i, \qquad i = 1, \ldots, n, \tag{1.1}$$

where $u_i$ are i.i.d. errors, and the covariates $x_i \in \mathbb{R}^p$ are independent from the errors. When there is an intercept the first component of $x_i$ is set to 1. In this paper, we study the problem of robust estimation of $\beta_0$ when the response variable is censored. Miller [12] studied least squares estimators (LS) for censored responses. He proposed to modify the classical LS estimator

$$\hat{\beta}_n = \arg\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^n (y_i - \beta' x_i)^2 = \arg\min_{\beta \in \mathbb{R}^p} E_{F_{n\beta}}[u^2], \tag{1.2}$$

replacing the empirical distribution of the residuals $F_{n\beta}$ with the corresponding Kaplan-Meier (KM) estimator $F^*_{n\beta}$ (Kaplan and Meier [9]). Unfortunately, the resulting estimator is not consistent in general and the iterative algorithm to compute it may have several or no solutions.

Buckley and James [2] studied a different extension of LS to censored response variables by modifying the LS score equations

$$\sum_{i=1}^n (y_i - \beta' x_i)\, x_i = 0, \tag{1.3}$$

using a conditional distribution approach. This proposal replaces censored residuals by their estimated conditional expectation given that the response is larger than the recorded (censored) value. The conditional expectation is estimated using $F^*_{n\beta}$. James and Smith [7] and Lai and Ying [10] showed that this estimator is consistent.

A different approach is proposed by Stute [19, 20] and Sellero, Manteiga and Van Keilegom [18]. They propose to apply the Kaplan-Meier estimator to the responses instead of to the residuals. The shortcomings of this approach are that it requires stronger assumptions on the censoring variable and that the proposed estimates are not regression equivariant.

In recent years there has been some interest in extending robust regression estimators to the case of censored response variables. Ritov [13] studied a generalization of Buckley and James' proposal for robust estimators. He considered monotone nondecreasing score functions $\psi$ (that correspond to unbounded loss functions $\rho$) and showed that under certain regularity conditions there exists a sequence of $\sqrt{n}$-consistent solutions to the estimating equations. This sequence is also asymptotically normal. Unfortunately, since these estimators are based on an unbounded loss function $\rho$ they are not robust against high-leverage outliers. More recently, Lai and Ying [11] extended the conditional expectation approach of Buckley and James to M-regression estimators for censored and truncated data. Their proposal also requires a monotone score function.

If we allow for a redescending score function $\psi$ (equivalently, a bounded loss function $\rho$), then the estimating equations may have several solutions with different robustness properties. Moreover, if we define a robust estimator as the solution to a minimization problem similar to (1.2) but replacing the squared residuals with $\rho(u)$ for a bounded loss function $\rho$, then this estimator may not be consistent (Lai and Ying [10, 11]). Hence, unlike in the uncensored regression model, we do not have a way to identify which solutions of the redescending score equations are not affected by the outliers.

In this paper, we extend the approach of Buckley and James and Ritov to M-estimators with bounded loss functions $\rho$. We achieve this by proposing an estimator that is the solution to a minimization problem that has a consistent and robust solution. In particular, we obtain extensions of the LMS (see Rousseeuw [14]), S (see Rousseeuw and Yohai [16]), MM-estimators (see Yohai [22]) and $\tau$-estimators (see Yohai and Zamar [23]). We show that these estimators are Fisher and $\sqrt{n}$-consistent, asymptotically normal, and that they have high breakdown point.

It is important to realize that when there are censored observations the breakdown point of an estimator may be much lower than in the uncensored case. For example, in the location model the worst contamination occurs when all the censored observations are between the outliers and the "good" noncensored points. Suppose that we have a fraction $\varepsilon$ of outliers going to $+\infty$ and a proportion $\lambda$ of censored observations. Since the KM estimator distributes the mass of the censored observations among the noncensored points to their right (Efron [4]), in this case the mass given to the outliers by the KM estimator will be $\gamma = \lambda + \varepsilon$. Consequently, the sample median will not break if $\gamma < 1/2$, or equivalently, if $\varepsilon < 1/2 - \lambda = \eta$. It follows that the breakdown point of the median is equal to $\eta$, which is less than 1/2 when there are censored observations.
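The worst-case configuration described above can be checked numerically. The following is a minimal sketch, not taken from the paper: the helper name `km_weights` and the toy sample (60 good points, 30 censored points placed between the good points and 10 outliers) are assumptions chosen for illustration. It computes Kaplan-Meier masses with Efron's redistribute-to-the-right rule and sums the mass that ends up on the outliers.

```python
import numpy as np

def km_weights(r, delta):
    """Kaplan-Meier point masses via the redistribute-to-the-right rule.

    r: observed (possibly censored) values; delta: 1 = uncensored, 0 = censored.
    A censored point keeps its mass only if nothing lies to its right.
    """
    r = np.asarray(r, dtype=float)
    delta = np.asarray(delta, dtype=int)
    n = len(r)
    # sort by value; at ties, uncensored observations come first
    order = np.lexsort((1 - delta, r))
    d = delta[order]
    w = np.full(n, 1.0 / n)          # masses in sorted order
    for pos in range(n - 1):
        if d[pos] == 0:
            # spread this censored mass equally over all points to the right
            w[pos + 1:] += w[pos] / (n - pos - 1)
            w[pos] = 0.0
    out = np.empty(n)
    out[order] = w
    return out

# Hypothetical sample: lambda = 0.30 censored, epsilon = 0.10 outliers.
good = np.linspace(0.0, 1.0, 60)
r = np.concatenate([good, np.full(30, 2.0), np.full(10, 1e6)])
delta = np.concatenate([np.ones(60), np.zeros(30), np.ones(10)]).astype(int)

w = km_weights(r, delta)
outlier_mass = w[r == 1e6].sum()     # mass the KM estimator puts on the outliers
print(outlier_mass)
```

In this configuration all the censored mass is pushed onto the outliers, so the outlier mass equals $\lambda + \varepsilon$, which is the quantity that must stay below 1/2 for the median not to break.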

The rest of this paper is organized as follows. Section 2 contains our 
main definitions. The robustness properties of our proposal are discussed 
in Section 3 and their asymptotic properties in Section 4. In Section 5, we 
present an algorithm to compute these estimators. An example with real-life 
data is given in Section 6 and the results of a Monte Carlo experiment are 
discussed in Section 7. The proofs of the theorems are given in the Appendix 
while those for the lemmas can be found in a technical report by Salibian- 
Barrera and Yohai [17]. 

2. Robust estimators. Consider the linear regression model (1.1). We assume that the sample may be right-censored, that is, there are unobservable random variables $c_1, \ldots, c_n$ independent from the errors $u_i$ such that we observe $y_i^* = \min(y_i, c_i)$ for $i = 1, \ldots, n$. In other words, the observed data are $z_i = (y_i^*, x_i', \delta_i)'$, $i = 1, \ldots, n$, where $\delta_i = I\{y_i \le c_i\}$, and $I\{A\}$ is the indicator function of the event $A$.

When the scale of the residuals is known, regression M-estimators for uncensored observations are defined by

$$\hat{\beta}_n = \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{n} \sum_{i=1}^n \rho(r_i(\beta)) = \arg\min_{\beta \in \mathbb{R}^p} E_{F_{n\beta}}[\rho(u)], \tag{2.1}$$

where $F_{n\beta}$ is the empirical distribution of the residuals $r_i(\beta) = y_i - \beta' x_i$, and $\rho : \mathbb{R} \to \mathbb{R}_+$ is a function satisfying:

P1. $\rho(0) = 0$ and $\rho$ is continuous at 0.
P2. $\rho(-u) = \rho(u)$ for $u > 0$.
P3. $\rho$ is monotone nondecreasing on $u > 0$.
P4. $\sup_u \rho(u) = a < +\infty$.
(See Huber [6].) If $\psi(u) = d\rho(u)/du$ then the estimator $\hat{\beta}_n$ also satisfies the following vector equation:

$$\frac{1}{n} \sum_{i=1}^n \psi(r_i(\beta))\, x_i = E_{H_{n\beta}}[\psi(u)\, x] = 0, \tag{2.2}$$

where $H_{n\beta}$ is the empirical distribution of the vectors $(r_i(\beta), x_i')' \in \mathbb{R}^{p+1}$, $i = 1, \ldots, n$.

Since not all the residuals $r_i(\beta)$ are observed in the presence of censoring, we can define the censored residuals by $r_i^*(\beta) = y_i^* - \beta' x_i$. Note that $r_i^*(\beta) = \min(r_i(\beta), c_i - \beta' x_i)$, and therefore we can think of the $r_i^*(\beta)$ as censored observations of $r_i(\beta)$ with censoring variables $c_i - \beta' x_i$, $i = 1, \ldots, n$. Then, in the case of a censored response variable, one way to generalize (2.1) is to replace it by

$$\hat{\beta}_n = \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{n} \sum_{i=1}^n E[\rho(r_i(\beta)) \mid z_i] = \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{n} \sum_{i=1}^n E_{F_\beta}[\rho(u) \mid w_i(\beta)], \tag{2.3}$$

where $F_\beta$ is the distribution of the residuals $r(\beta)$, $w_i(\beta) = (r_i^*(\beta), \delta_i)$ and

$$E_{F_\beta}[\rho(u) \mid w_i(\beta)] = \begin{cases} \rho(r_i^*(\beta)), & \text{if } \delta_i = 1, \\[4pt] \dfrac{\int_{r_i^*(\beta)}^{\infty} \rho(u)\, dF_\beta(u)}{1 - F_\beta(r_i^*(\beta))}, & \text{if } \delta_i = 0. \end{cases}$$

Intuitively, to obtain (2.3) from (2.1), for each censored observation we replace the term $\rho(r_i(\beta))$ in (2.1) by the conditional expectation of $\rho(u)$ given that the (actual but unobserved) residual is larger than or equal to the observed censored residual $r_i^*(\beta)$.
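As a small illustration of this conditional expectation, the sketch below evaluates $E_F[\rho(u) \mid w_i]$ for a discrete estimate of the residual distribution. The function name `cond_expected_loss`, the truncated-square loss and the three-atom distribution are hypothetical choices, and the tail is taken over atoms strictly beyond the censored residual, which assumes that no atom coincides with it.

```python
import numpy as np

def cond_expected_loss(r_star, delta, support, mass, rho):
    """E_F[rho(u) | w] for one observation w = (r_star, delta).

    support, mass: atoms and probabilities of a discrete estimate of F_beta
    (for instance Kaplan-Meier masses); rho: loss function.
    """
    if delta == 1:                      # residual fully observed
        return float(rho(r_star))
    tail = support > r_star             # atoms beyond the censored residual
    denom = mass[tail].sum()            # estimate of 1 - F(r_star)
    return float((rho(support[tail]) * mass[tail]).sum() / denom)

# A simple bounded loss satisfying P1-P4 (illustrative choice).
rho = lambda u: np.minimum(np.square(u), 1.0)

support = np.array([-1.0, 0.5, 2.0])
mass = np.array([0.2, 0.3, 0.5])

uncensored = cond_expected_loss(0.5, 1, support, mass, rho)  # plain rho(0.5)
censored = cond_expected_loss(0.0, 0, support, mass, rho)    # average over atoms > 0
print(uncensored, censored)
```

For the censored case the loss is averaged over the atoms at 0.5 and 2.0 with their masses renormalized by the tail probability 0.8.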

The score equations in (2.2) can also be similarly modified to obtain

$$\frac{1}{n} \sum_{i=1}^n E_{F_\beta}[\psi(u) \mid w_i(\beta)]\, x_i = 0. \tag{2.4}$$

Since the distribution of the residuals $F_\beta$ in (2.3) and (2.4) is unknown, we can estimate it with the Kaplan-Meier estimator $F^*_{n\beta}$ based on $r_i^*(\beta)$. To guarantee consistency of the estimator defined by

$$\hat{\beta}_n = \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{n} \sum_{i=1}^n E_{F^*_{n\beta}}[\rho(u) \mid w_i(\beta)] \tag{2.5}$$

we need $F^*_{n\beta}$ to be consistent for $F_\beta$ for all $\beta$. Let $F_0$ and $D_0$ be the distribution functions of the errors $u_i$ and censoring variables $c_i$, $i = 1, \ldots, n$, respectively. Let $\tau_{F_0} = \inf\{u : F_0(u) = 1\}$ and let $\tau_{D_0}$ be defined similarly. In what follows we will assume that:

R1. $\tau_{F_0} < \tau_{D_0}$, or $\tau_{F_0} = \tau_{D_0} = \infty$, or $\tau_{F_0} = \tau_{D_0}$ and $\tau_{F_0}$ is a continuity point of $F_0$.
R2. $F_0$ and $D_0$ do not have jumps in common.

Under these conditions, a sufficient condition for the KM estimator to be consistent is the independence between the uncensored variables and the censoring times (see, e.g., Breslow and Crowley [1]). When $\beta = \beta_0$ we have $r_i(\beta_0) = u_i$, which are independent from the corresponding censoring times $c_i - \beta_0' x_i$ because we have assumed that the errors are independent from the $c_i$'s and the $x_i$'s. However, for $\beta \ne \beta_0$ it is not generally true that $r_i(\beta)$ is independent from $c_i - \beta' x_i$, $i = 1, \ldots, n$. Hence, we can only guarantee the consistency of $F^*_{n\beta}$ to $F_\beta$ when $\beta = \beta_0$. Therefore, the estimator defined in (2.5) may not be consistent (Lai and Ying [10, 11]).

On the other hand, note that the estimator $\hat{\beta}_n$ defined as the solution to

$$\frac{1}{n} \sum_{i=1}^n E_{F^*_{n\beta}}[\psi(u) \mid w_i(\beta)]\, x_i = 0, \tag{2.6}$$

is Fisher consistent. In fact, $F^*_{n\beta_0} \to F_{\beta_0}$ and therefore

$$\frac{1}{n} \sum_{i=1}^n E_{F^*_{n\beta_0}}[\psi(u) \mid w_i(\beta_0)]\, x_i \to E_{H_0}(\psi(u)\, x) = 0,$$

where $H_0$ is the joint distribution of $(u, x')'$. It is important to note that, unlike in the uncensored regression case, equations (2.5) and (2.6) are not equivalent: we cannot obtain (2.6) by differentiating (2.5) because $F^*_{n\beta}$ depends on $\beta$.

M-estimators defined by (2.6) were first proposed by Ritov [13] and further studied by Lai and Ying [11] when $\psi(u)$ is monotone (which corresponds to a convex $\rho$). However, it is well known that M-estimators with monotone $\psi$ functions are only robust against low leverage outliers. As mentioned in the Introduction, the main difficulty in using a redescending $\psi$ in (2.6) is that in general this equation may have several solutions with different robustness properties. Although in the uncensored regression model this difficulty can be avoided by defining the estimator as the solution to the minimization problem (2.1), the corresponding minimization in the censored case (2.5) does not in general yield a consistent estimator. In other words, (2.5) cannot be used to select a consistent solution of (2.6). For this reason, in the next subsection we will define robust M-estimators as the solution of a minimization problem using a bounded loss function $\rho$ that has a consistent sequence of solutions.

2.1. Consistent M-estimators. First note that to obtain scale equivariant regression estimators, we need to standardize the residuals in the estimating equations using a robust error scale estimator $s_n$.

Let $\rho : \mathbb{R} \to \mathbb{R}_+$ satisfy regularity conditions P1-P4 above. For each $\beta$ and $\gamma$ in $\mathbb{R}^p$ define

$$C_n(\beta, \gamma) = \frac{1}{n} \sum_{i=1}^n E_{F^*_{n\beta}}\left[ \rho\left( \frac{u - \gamma' x_i}{s_n} \right) \,\Big|\, w_i(\beta) \right], \tag{2.7}$$

where $s_n$ is a robust scale estimator of the residuals. For each $\beta \in \mathbb{R}^p$ let

$$\hat{\gamma}_n(\beta) = \arg\min_{\gamma \in \mathbb{R}^p} C_n(\beta, \gamma). \tag{2.8}$$

Note that $\hat{\gamma}_n(\beta)$ can be considered an M-estimator of regression of the residuals $r_i(\beta)$ on the covariates $x_i$. Since $F^*_{n\beta_0}$ is a consistent estimator of $F_{\beta_0}$, the distribution of the $u_i$'s, and since the errors are independent of the $x_i$'s, it is reasonable to expect that $\hat{\gamma}_n(\beta_0) \to 0$. This can be formally proved with similar arguments to those used in the proof of Theorem 5 below. Therefore, we define an estimator $\hat{\beta}_n$ of $\beta_0$ by the equation

$$\hat{\gamma}_n(\hat{\beta}_n) = 0. \tag{2.9}$$

To avoid existence problems, we can alternatively define $\hat{\beta}_n$ as

$$\hat{\beta}_n = \arg\min_{\beta \in \mathbb{R}^p} \left[ \hat{\gamma}_n(\beta)' A_n \hat{\gamma}_n(\beta) \right], \tag{2.10}$$

where $A_n = A_n(x_1, \ldots, x_n)$ is any robust equivariant estimator of the covariance matrix of the explanatory variables $x_i$, $1 \le i \le n$. The covariance matrix $A_n$ is needed to maintain the affine equivariance of the estimator.

As an illustration of the difference between using (2.5) and (2.9) to define a robust estimator, in Figure 1 we plot $\|\hat{\gamma}_n(\beta)\|$ and the score equations (2.5) as a function of $\beta$ for a data set of $n = 200$ observations with $\beta_0 = 1.5$ and a probability of censoring of approximately 32%. These data were generated following the same model we used in our simulation study described in Section 7. Note that although the score equation has two distinct solutions and only one is close to the true value of $\beta_0 = 1.5$, our proposed optimization problem has a unique minimum and this minimum is close to $\beta_0$. This definition may be considered an extension of Ritov's M-estimators for censored data to the case of bounded $\rho$ functions. In particular, note that $\hat{\beta}_n$ satisfies equation (2.6) with $\psi(u) = \rho'(u)$. It follows that this estimator will have the same asymptotic properties as the estimators considered in Ritov [13].



2.2. S-estimators. The scale estimator $s_n$ in (2.7) may be chosen to be the scale of the residuals of an initial (and scale-equivariant) estimator that does not require a scale estimator itself. One class of estimates that satisfies this is the class of S-estimators (Rousseeuw and Yohai [16]). We can extend this class of estimators to the case of censored observations following the same principle as above, that is, for each $\beta$ we fit an S-estimate to the residuals $r_i^*(\beta)$, and find the $\beta$ whose residuals have the "smallest" S-estimator (i.e., the one with the smallest norm).

Let $\rho_1$ satisfy regularity conditions P1-P4 and let $b = E_{F_0}[\rho_1(u)]$ where $F_0$ is the distribution of the errors $u_i$ in (1.1). Define the M-scale $S_n(\beta, \gamma)$ by

$$\frac{1}{n} \sum_{i=1}^n E_{F^*_{n\beta}}\left[ \rho_1\left( \frac{u - \gamma' x_i}{S_n(\beta, \gamma)} \right) \,\Big|\, w_i(\beta) \right] = b \tag{2.11}$$

and let

$$\hat{\gamma}_n(\beta) = \arg\min_{\gamma \in \mathbb{R}^p} S_n(\beta, \gamma). \tag{2.12}$$

Note that $\hat{\gamma}_n(\beta)$ is the S-estimator of regression of the residuals $(r_i^*(\beta), x_i')'$, $i = 1, \ldots, n$. We define the S-regression estimator for censored responses as the vector $\hat{\beta}_n$ such that

$$\hat{\gamma}_n(\hat{\beta}_n) = 0. \tag{2.13}$$

As before, to avoid existence problems, the following definition is also natural:

$$\hat{\beta}_n = \arg\min_{\beta \in \mathbb{R}^p} \left[ \hat{\gamma}_n(\beta)' A_n \hat{\gamma}_n(\beta) \right],$$

where $A_n = A_n(x_1, \ldots, x_n)$ is any robust equivariant estimator of the covariance matrix of the covariates $x_i$.

A robust residual scale estimate $s_n$ can be defined by

(2.14)

Fig. 1. Panel (a) shows an example where the score equations (2.5) have two roots with only one of them close to $\beta_0 = 1.5$, whereas panel (b) shows that, for the same data set, the objective function of (2.10) has a unique minimum close to $\beta_0$. Panel titles: (a) right-hand side of score equation (2.5); (b) $\|\hat{\gamma}_n(\beta)\|$.
In particular, we can obtain a consistent version of the LMS using as $\rho_1$ a jump function

$$\rho_1(u) = \begin{cases} 0, & \text{if } |u| < 1, \\ 1, & \text{if } |u| \ge 1, \end{cases} \tag{2.15}$$

and $b = 1/2$ in equation (2.11) above.

In Section 3, we will show that the choice $b = \sup_u \rho_1(u)/2$ yields regression estimators with high breakdown point. However, we know from the uncensored case that S-estimators cannot combine high breakdown point with high efficiency for normal errors (see Hössjer [5]). To overcome this problem, in the next subsection we will extend to the censored case a class of estimators that can achieve simultaneous high efficiency and high breakdown point.



2.3. MM-estimators. Yohai [22] proposed a class of estimators, called MM-estimators, that simultaneously have breakdown point 50% and high efficiency for normal errors. In this subsection, we extend this class of estimators to the case of censored responses.

Consider two functions $\rho_1$ and $\rho_2$ that satisfy the regularity conditions P1-P4. Moreover, assume that $\rho_2(u) \le \rho_1(u)$ for all $u$ and that $\sup_u \rho_2(u) = \sup_u \rho_1(u)$. Let $\hat{\beta}_n$ and $s_n$ be the S-regression and S-scale estimators calculated as in (2.13) and (2.14), respectively. For each $\gamma \in \mathbb{R}^p$ define $R(\gamma)$ as

$$R(\gamma) = \frac{1}{n} \sum_{i=1}^n E_{F^*_{n\hat{\beta}_n}}\left[ \rho_2\left( \frac{u - \gamma' x_i}{s_n} \right) \,\Big|\, w_i(\hat{\beta}_n) \right] \tag{2.16}$$

and let $\hat{\gamma}_n$ be a local minimum of $R(\cdot)$ such that $R(\hat{\gamma}_n) \le R(0)$. The MM-estimator $\tilde{\beta}_n$ for censored regression is defined by

$$\tilde{\beta}_n = \hat{\beta}_n + \hat{\gamma}_n. \tag{2.17}$$

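The motivation for the definition in (2.17) is as follows. We improve the initial S-estimator $\hat{\beta}_n$ by fitting an efficient M-estimator to the residuals of $\hat{\beta}_n$. The resulting M-estimate $\hat{\gamma}_n$ is the required correction. Expanding the conditional expectations in (2.16) we obtain

$$R(\gamma) = \frac{1}{n} \sum_{i=1}^n \left[ \delta_i\, \rho_2\left( \frac{r_i^*(\hat{\beta}_n) - \gamma' x_i}{s_n} \right) + (1 - \delta_i)\, \frac{\int_{r_i^*(\hat{\beta}_n)}^{\infty} \rho_2\left( \frac{u - \gamma' x_i}{s_n} \right) dF^*_{n\hat{\beta}_n}(u)}{1 - F^*_{n\hat{\beta}_n}(r_i^*(\hat{\beta}_n))} \right]. \tag{2.18}$$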
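For each $i$ such that $\delta_i = 0$ let $M_i = \{j : r_j^*(\hat{\beta}_n) > r_i^*(\hat{\beta}_n),\ \delta_j = 1\}$. Then, we have

$$\int_{r_i^*(\hat{\beta}_n)}^{\infty} \rho_2\left( \frac{u - \gamma' x_i}{s_n} \right) dF^*_{n\hat{\beta}_n}(u) = \sum_{j \in M_i} \rho_2\left( \frac{r_j^*(\hat{\beta}_n) - \gamma' x_i}{s_n} \right) \pi_j \tag{2.19}$$

and $1 - F^*_{n\hat{\beta}_n}(r_i^*(\hat{\beta}_n)) = \sum_{j \in M_i} \pi_j$, where $\pi_j$, $j \in M = \{j : \delta_j = 1\}$, are the probabilities given to the uncensored $r_j^*(\hat{\beta}_n)$ by the KM estimator $F^*_{n\hat{\beta}_n}$.

For $i, j = 1, \ldots, n$ let

$$\pi_{ij} = \begin{cases} \pi_j \Big/ \Big( n \sum_{k \in M_i} \pi_k \Big), & \text{if } \delta_i = 0 \text{ and } j \in M_i, \\ 1/n, & \text{if } \delta_i = 1 \text{ and } i = j, \\ 0, & \text{otherwise.} \end{cases} \tag{2.20}$$

Then, from (2.18) and (2.19) we have

$$R(\gamma) = \sum_{i=1}^n \sum_{j=1}^n \rho_2\left( \frac{r_j^*(\hat{\beta}_n) - \gamma' x_i}{s_n} \right) \pi_{ij}. \tag{2.21}$$

Since the $\pi_{ij}$'s do not depend on $\gamma$, a local minimum of $R(\gamma)$ will satisfy

$$\sum_{i=1}^n \sum_{j=1}^n \rho_2'\left( \frac{r_j^*(\hat{\beta}_n) - \gamma' x_i}{s_n} \right) x_i\, \pi_{ij} = 0.$$

Similarly to the uncensored case, this equation can be written as

$$\sum_{i=1}^n \sum_{j=1}^n \omega_{ij}\, (r_j^*(\hat{\beta}_n) - \gamma' x_i)\, x_i = 0, \tag{2.22}$$

where

$$\omega_{ij} = \frac{\rho_2'\big( (r_j^*(\hat{\beta}_n) - \gamma' x_i)/s_n \big)}{(r_j^*(\hat{\beta}_n) - \gamma' x_i)/s_n}\, \pi_{ij}.$$

Hence, a local minimum of $R(\gamma)$ is the weighted least squares estimator for the points $(r_j^*(\hat{\beta}_n), x_i)$ with weights $\omega_{ij}$, $i, j = 1, \ldots, n$. Equation (2.22) suggests that an iterative reweighted least squares algorithm can be used to find a local minimum of $R(\gamma)$. Furthermore, since we need to find a local minimum such that $R(\gamma) \le R(0)$, and reweighted least squares iterations reduce the objective function (see Remark 1 to Lemma 8.3 in Huber [6], page 186) we can start this algorithm at $\gamma = 0$.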
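The reweighted least squares iteration suggested by (2.22) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the names `irls_correction` and `bisquare_weight`, the tuning constant 4.685 and the stopping rule are hypothetical, the matrix of $\pi_{ij}$ is passed in as `Pi`, and the demo uses the uncensored special case $\pi_{ii} = 1/n$ with noise-free residuals so that the answer is known.

```python
import numpy as np

def bisquare_weight(e, c=4.685):
    """psi(e)/e for Tukey's bisquare with tuning constant c (illustrative)."""
    return np.where(np.abs(e) <= c, (1.0 - (e / c) ** 2) ** 2, 0.0)

def irls_correction(r, X, Pi, s, weight, n_iter=50, tol=1e-10):
    """Iteratively reweighted LS for the pairs (r_j, x_i), started at gamma = 0.

    r: residuals of the initial fit; X: (n, p) design; Pi: (n, n) matrix of
    pi_ij; s: residual scale; weight: function returning psi(e)/e.
    """
    n, p = X.shape
    gamma = np.zeros(p)
    for _ in range(n_iter):
        # standardized residual of pair (i, j): (r_j - gamma' x_i) / s
        e = (r[None, :] - (X @ gamma)[:, None]) / s
        W = weight(e) * Pi                       # omega_ij for the current gamma
        # normal equations: sum_ij W_ij x_i x_i' gamma = sum_ij W_ij r_j x_i
        A = X.T @ (W.sum(axis=1)[:, None] * X)
        b = X.T @ (W @ r)
        new = np.linalg.solve(A, b)
        done = np.max(np.abs(new - gamma)) < tol
        gamma = new
        if done:
            break
    return gamma

# Demo: no censoring (Pi = I/n) and residuals that are exactly linear in x.
x = np.linspace(-1.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])
g0 = np.array([0.2, -0.3])
r = X @ g0
gamma = irls_correction(r, X, np.eye(20) / 20, s=1.0, weight=bisquare_weight)
print(gamma)
```

With censored data only `Pi` changes; each censored row then spreads its weight over the uncensored residuals in $M_i$, which is why the weights carry two indices.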
2.4. $\tau$-estimators. Another way to obtain estimators with high breakdown point and high efficiency for normal errors with censored responses is to extend the class of $\tau$-estimators (Yohai and Zamar [23]). These estimators are based on an efficient scale estimator, called $\tau$-scale.

Let $\rho_1 : \mathbb{R} \to \mathbb{R}_+$ and $\rho_2 : \mathbb{R} \to \mathbb{R}_+$ satisfy conditions P1-P4, and let $b = E_{F_0}(\rho_1)$. Moreover, to obtain consistent estimators, we will assume that $\rho_1$ and $\rho_2$ satisfy:

P5. $\rho_i$, $i = 1, 2$, are continuous, and if $0 \le v < w$ with $\rho_2(w) < \sup_u \rho_2(u)$ then $\rho_2(v) < \rho_2(w)$.
P6. $2\rho_2(u) - \rho_2'(u)\, u \ge 0$.

Given a sample $u_1, \ldots, u_n$ let $s_n$ be the solution of

$$\frac{1}{n} \sum_{i=1}^n \rho_1(u_i / s_n) = b$$

and define the $\tau$-scale as

$$\tau_n^2 = s_n^2\, \frac{1}{n} \sum_{i=1}^n \rho_2(u_i / s_n).$$

The extension of the $\tau$-estimators for censored data follows the same lines as the one for S-estimators but using a $\tau$-scale instead of an S-scale.
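For an uncensored sample, the M-scale and the $\tau$-scale defined above can be computed with a few lines of code. The sketch below is only illustrative: it uses Tukey's bisquare loss rescaled so that its supremum is 1, and the constants `b = 0.5`, `c1 = 1.547` and `c2 = 6.08` are assumptions of this example rather than values stated in the text. The M-scale equation is solved by bisection on a log scale, which works because the average loss is nonincreasing in the scale.

```python
import numpy as np

def rho_bisquare(u, c):
    """Tukey bisquare loss scaled so that sup rho = 1."""
    v = np.clip(np.abs(u) / c, 0.0, 1.0)
    return 1.0 - (1.0 - v ** 2) ** 3

def m_scale(u, b=0.5, c1=1.547, n_iter=100):
    """Solve mean(rho1(u / s)) = b for s by bisection on a log scale."""
    u = np.asarray(u, dtype=float)
    s0 = np.median(np.abs(u))            # assumed positive
    lo, hi = s0 * 1e-3, s0 * 1e3
    for _ in range(n_iter):
        mid = np.sqrt(lo * hi)
        if rho_bisquare(u / mid, c1).mean() > b:
            lo = mid                     # scale too small: average loss too large
        else:
            hi = mid
    return np.sqrt(lo * hi)

def tau_scale(u, b=0.5, c1=1.547, c2=6.08):
    """tau_n = s_n * sqrt(mean(rho2(u / s_n))) with s_n the M-scale."""
    u = np.asarray(u, dtype=float)
    s = m_scale(u, b, c1)
    return s * np.sqrt(rho_bisquare(u / s, c2).mean())

rng = np.random.default_rng(0)
u = rng.normal(size=500)
s = m_scale(u)
tau = tau_scale(u)
print(s, tau)
```

Both scales are equivariant: multiplying the sample by a positive constant multiplies `s` and `tau` by the same constant.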
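More specifically, let $S_n(\beta, \gamma)$ be as in (2.11) and define $\tau_n(\beta, \gamma)$ by

$$\tau_n(\beta, \gamma)^2 = S_n(\beta, \gamma)^2\, \frac{1}{n} \sum_{i=1}^n E_{F^*_{n\beta}}\left[ \rho_2\left( \frac{u - \gamma' x_i}{S_n(\beta, \gamma)} \right) \,\Big|\, w_i(\beta) \right]. \tag{2.23}$$

Let

$$\hat{\gamma}_n(\beta) = \arg\min_{\gamma \in \mathbb{R}^p} \tau_n(\beta, \gamma) \tag{2.24}$$

and define the $\tau$-estimator $\hat{\beta}_n$ as in (2.9) or (2.10).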



2.5. Alternative representation. In this section, we show an alternative way of writing the estimating equations that define our estimators for censored data. This alternative representation is most useful when computing these estimators for non-smooth functions $\rho(u)$ (e.g., the least median of squares, LMS). We will also use this representation in our proofs in the Appendix. This approach also lets us understand better the connection between the estimators defined in the previous sections and their uncensored counterparts.

Let $r_1, \ldots, r_n$ be a random sample from a distribution $F$, and let $c_1, \ldots, c_n$ be unobservable censoring variables independent from the $r_i$'s. Suppose that we observe $r_i^* = \min(r_i, c_i)$ and let $\delta_i = I\{r_i \le c_i\}$ where $I\{A\}$ is the indicator function of the event $A$. The Kaplan-Meier estimator of $F$ assigns positive weights only to noncensored observations. Furthermore, the self-consistency property of the Kaplan-Meier estimator (Efron [4]) implies that, if $\pi_j$ is the probability assigned to $r_j^*$ for $\delta_j = 1$, then

$$\pi_j = \frac{1}{n} + \sum_{i:\ r_j^* > r_i^*,\ \delta_i = 0} \pi_{ij}, \tag{2.25}$$

where the $\pi_{ij}$'s are given by (2.20). Observe that $\pi_{ij}$ can be interpreted as the proportion of the mass from the censored $i$th observation that is assigned to the $j$th point. Note that the mass $1/n$ of each censored observation $r_i^*$ is distributed among all the uncensored $r_j^* > r_i^*$ with $\delta_j = 1$ proportionally to $\pi_j$.
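The self-consistency relation (2.25) is easy to verify numerically. The following sketch builds the Kaplan-Meier masses $\pi_j$ and the matrix of $\pi_{ij}$ from (2.20) for a simulated sample and checks that the column sums of the matrix reproduce the $\pi_j$. The helper names `km_masses` and `pi_matrix` are hypothetical, ties are assumed absent, and the largest residual is forced to be uncensored so that every set $M_i$ is nonempty.

```python
import numpy as np

def km_masses(r, delta):
    """Kaplan-Meier point masses (product-limit form, assuming no ties)."""
    n = len(r)
    order = np.argsort(r)
    d = delta[order].astype(float)
    at_risk = n - np.arange(n)                        # n, n-1, ..., 1
    surv = np.cumprod(1.0 - d / at_risk)              # survival just after each point
    surv_before = np.concatenate(([1.0], surv[:-1]))  # survival just before each point
    mass = np.empty(n)
    mass[order] = surv_before * d / at_risk
    return mass

def pi_matrix(r, delta):
    """pi_j and the matrix pi_ij of (2.20); row i is the source, column j the target."""
    n = len(r)
    pi = km_masses(r, delta)
    Pi = np.zeros((n, n))
    for i in range(n):
        if delta[i] == 1:
            Pi[i, i] = 1.0 / n
        else:
            M_i = (r > r[i]) & (delta == 1)           # uncensored points to the right
            Pi[i, M_i] = pi[M_i] / (n * pi[M_i].sum())
    return pi, Pi

rng = np.random.default_rng(0)
n = 30
r = rng.normal(size=n)
delta = (rng.uniform(size=n) < 0.7).astype(int)
delta[np.argmax(r)] = 1          # assumption: the largest residual is uncensored
pi, Pi = pi_matrix(r, delta)
print(np.abs(Pi.sum(axis=0) - pi).max())   # discrepancy in (2.25)
```

Each row of the matrix sums to $1/n$ (every observation contributes its own mass) and each column sums to the Kaplan-Meier mass of the corresponding point.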

Suppose now that $r_i^* = r_i^*(\beta)$ for $1 \le i \le n$ are residuals for some vector of regression parameters $\beta$, let $x_i$, $1 \le i \le n$, be the corresponding vectors of covariates and call $\pi_{\beta,ij}$ the values given by (2.20). The censored residual sample can be written as $z_1 = (r_1^*(\beta), \delta_1, x_1')', \ldots, z_n = (r_n^*(\beta), \delta_n, x_n')'$. Consider the discrete distribution function $H^*_{n\beta}$ that assigns mass $\pi_{\beta,ij}$ to the point $(r_j^*(\beta), x_i)$. Following the same arguments leading to (2.21) it is easy to show that for any function $g : \mathbb{R} \times \mathbb{R}^p \to \mathbb{R}$ we have

$$\frac{1}{n} \sum_{i=1}^n E_{F^*_{n\beta}}[g(u, x_i) \mid z_i] = \sum_{i=1}^n \sum_{j=1}^n g(r_j^*(\beta), x_i)\, \pi_{\beta,ij} = E_{H^*_{n\beta}}[g(u, x)]. \tag{2.26}$$

Then, $C_n(\beta, \gamma)$ in (2.7) can be written as

$$C_n(\beta, \gamma) = E_{H^*_{n\beta}}\left[ \rho\left( \frac{u - \gamma' x}{s_n} \right) \right].$$

This formula simplifies some computations. For example, consider the jump function $\rho$ defined in (2.15) and the solution $s_n$ to $E_{H^*_{n\beta}}[\rho(u/s_n)] = 1/2$. Noting that the marginal distribution of the first coordinate of $H^*_{n\beta}$ is $F^*_{n\beta}$, we have that

$$s_n = \operatorname{median}_{H^*_{n\beta}}(|u|) = \operatorname{median}_{F^*_{n\beta}}(|u|),$$

and thus iterative algorithms are not required.
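A median under a discrete distribution such as $F^*_{n\beta}$ is a weighted median, which can be read off the cumulative masses directly. The short sketch below illustrates this; the function name `weighted_median_abs` and the four-point distribution are hypothetical, and the median is taken as the smallest value whose cumulative mass reaches 1/2.

```python
import numpy as np

def weighted_median_abs(values, mass):
    """Median of |u| under the discrete distribution putting mass[k] on values[k]."""
    a = np.abs(np.asarray(values, dtype=float))
    order = np.argsort(a)
    cum = np.cumsum(np.asarray(mass, dtype=float)[order])
    k = np.searchsorted(cum, 0.5)        # first index with cumulative mass >= 1/2
    return a[order][k]

# Hypothetical residuals with their (Kaplan-Meier style) masses.
residuals = np.array([-3.0, -1.0, 0.5, 2.0])
mass = np.array([0.1, 0.4, 0.3, 0.2])
s_n = weighted_median_abs(residuals, mass)
print(s_n)
```

Here the sorted absolute values 0.5, 1, 2, 3 carry cumulative masses 0.3, 0.7, 0.9, 1.0, so the weighted median is the second one.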

The following theorem shows that $H^*_{n\beta_0}$ is consistent to the true joint distribution function $H_0(u, x) = F_0(u)G(x)$ when $\beta = \beta_0$. Moreover, Theorem A.1 in the Appendix shows that if $\hat{\beta}_n \xrightarrow{P} \beta_0$, then $H^*_{n\hat{\beta}_n} \xrightarrow{P} H_0(u, x)$.

Theorem 1. Let $(y_i^*, x_i, \delta_i)$, $i = 1, \ldots, n$, be observations from a censored linear regression model as in Section 2, and assume that the errors and censoring variables satisfy R1 and R2 of Section 2. Let $H^*_{n\beta}$ be defined as above. Then $H^*_{n\beta_0}(u, x) \to H_0(u, x)$ a.s.
3. Breakdown point. In general, for a sample $Z_n$ of size $n$, the finite-sample breakdown point (Donoho and Huber [3]) of an estimator $T_n = T_n(Z_n)$ is defined as

$$\varepsilon_n^*(T_n, Z_n) = \min_{1 \le k \le n} \left\{ k/n : \sup_{Z_n^k} \| T_n(Z_n^k) - T_n(Z_n) \| = \infty \right\},$$

where the supremum is taken over all possible samples $Z_n^k$ which are obtained by replacing $k$ observations from $Z_n$ with arbitrary values and $\|T\|$ is the $L_2$ norm.

Let $Z_n = (z_1, \ldots, z_n)$ be a sample from a censored linear regression model, where $z_i = (y_i^*, x_i, \delta_i)$, $x_i \in \mathbb{R}^p$. Assume that the rank of $\{x_1, \ldots, x_n\}$ is $p$ and let $q = \max_{\|\theta\| = 1} \#\{i : \theta' x_i = 0\}$. Let $m$ be the number of censored observations in the sample, $m = \sum_{i=1}^n (1 - \delta_i)$. The following theorems show that a lower bound for the breakdown point of S-, MM- and $\tau$-regression estimators is

$$\gamma = k_0 / n, \tag{3.1}$$

where

$$k_0 = \min\left( n\left(1 - \frac{b}{a}\right) - q - m,\ n\,\frac{b}{a} - m \right), \tag{3.2}$$

$b$ is the right-hand side of equation (2.11) and $a = \sup_u \rho(u)$.

Theorem 2 (Breakdown point of S-estimators). Let $S$ be a scale estimating functional based on a function $\rho$ satisfying P1-P4. Let $\hat{\beta}_n$ be the S-estimator defined in Section 2.2, then

$$\varepsilon_n^*(\hat{\beta}_n, Z_n) \ge \gamma. \tag{3.3}$$

Theorem 3 (Breakdown point of MM-estimators). Let $\hat{\beta}_n$ be the MM-estimator defined in Section 2.3 with functions $\rho_1$ and $\rho_2$ satisfying P1-P4, $\rho_2 \le \rho_1$ and $a = \sup \rho_2 = \sup \rho_1$. Then $\varepsilon_n^*(\hat{\beta}_n, Z_n) \ge \gamma$.

The following theorem is proved in Salibian-Barrera and Yohai [17].

Theorem 4 (Breakdown point of $\tau$-estimators). Let $\hat{\beta}_n$ be the $\tau$-estimator defined in Section 2.4 with loss functions $\rho_1$ and $\rho_2$ satisfying P1-P6. Then $\varepsilon_n^*(\hat{\beta}_n, Z_n) \ge \gamma$.

Note that the lower bound in (3.1) is maximized when $b/a = (1 - q/n)/2$. The smallest possible value of $q$ is $p - 1$, and in this case, the sample is said to be in general position (Rousseeuw and Leroy [15]). Using the optimal $b/a$ we have

$$\gamma = \frac{1}{n} \left( \frac{n - p + 1 - 2m}{2} \right).$$

Note that when $n \to \infty$ the right-hand side converges to $1/2 - \lambda$, where $\lambda$ is the probability of censoring. This is in agreement with our discussion in the Introduction, where we mention that the breakdown point of the median may be as small as $1/2 - \lambda$ when there are censored observations. Although in linear regression models with uncensored response variables it is possible to obtain robust regression estimators with asymptotic breakdown point of 0.5, we believe that the loss in breakdown point observed in the censored case is due to the use of the Kaplan-Meier estimator, which may convert censored observations into outliers. We conjecture that this loss cannot be reduced, at least when the estimate is defined using the Kaplan-Meier estimate.
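The bound in (3.1) and (3.2) is simple arithmetic, and a quick calculation shows how the censoring count lowers it. The sketch below is a minimal illustration: the function name `breakdown_lower_bound` and the example values ($n = 100$, $p = 2$, $m = 32$) are hypothetical, and the bound is evaluated as $k_0 = \min(n(1 - b/a) - q - m,\ n\,b/a - m)$ with $q = p - 1$ (general position) and, by default, the maximizing ratio $b/a = (1 - q/n)/2$.

```python
def breakdown_lower_bound(n, p, m, b_over_a=None):
    """Lower bound gamma = k0 / n of (3.1)-(3.2) for a sample in general position.

    n: sample size; p: number of covariates; m: number of censored observations;
    b_over_a: ratio b / a (defaults to the maximizing value (1 - q/n) / 2).
    """
    q = p - 1                                   # general position
    if b_over_a is None:
        b_over_a = (1.0 - q / n) / 2.0
    k0 = min(n * (1.0 - b_over_a) - q - m, n * b_over_a - m)
    return k0 / n

gamma_censored = breakdown_lower_bound(n=100, p=2, m=32)
gamma_uncensored = breakdown_lower_bound(n=100, p=2, m=0)
print(gamma_censored, gamma_uncensored)
```

With the optimal ratio the two terms of the minimum coincide, and the result agrees with $(n - p + 1 - 2m)/(2n)$; moving $b/a$ away from that value can only lower the bound.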

4. Asymptotic properties. The next theorem shows a property related to the consistency of the S-estimator defined in Section 2.2.

Theorem 5. Let $\rho$ satisfy regularity conditions P1-P4. Let the errors $u$ and covariates $x$ in the linear model (1.1) have joint distribution function $H_0(u, x) = F_0(u)G(x)$ such that $F_0(u)$ is symmetric and has a unimodal density, and $G(\beta' x \ne 0) = t > b/a$ for all $\beta \in \mathbb{R}^p$. Assume that R1 and R2 of Section 2 hold, and let $\hat{\gamma}_n(\beta_0) = \arg\min_{\gamma} S_n(\beta_0, \gamma)$, where $S_n(\beta, \gamma)$ is defined in (2.11). Then $\hat{\gamma}_n(\beta_0) \to 0$.

The same kind of arguments used in the proof of Theorem 5 can be used to prove similar results for MM-estimators as defined in Section 2.3. Note that a complete proof of consistency would require showing that if $\beta \ne \beta_0$ then $\|\hat{\gamma}_n(\beta)\|$ remains asymptotically away from zero. We have not been able to prove this. However, in all our numerical experiments this property seems to hold.

We can nonetheless prove the local consistency and asymptotic normality of the M-estimates defined in Section 2.1. The proof is based on Theorem 5.1 in Ritov [13] where the author studies M-estimates for censored regression which solve (2.6). Unfortunately, showing that there exists a sequence of consistent solutions of this equation seems to be very difficult. However, it can be shown that there exists a sequence $\hat{\beta}_n$ of approximate solutions to this equation which is $\sqrt{n}$-consistent and asymptotically normal. More precisely, under some regularity conditions Ritov [13] shows that there exists a sequence $\hat{\beta}_n$ such that

$$\frac{1}{\sqrt{n}} \sum_{i=1}^n E_{F^*_{n\hat{\beta}_n}}[\psi(u) \mid w_i(\hat{\beta}_n)]\, x_i \xrightarrow{P} 0 \tag{4.1}$$

and such that $\sqrt{n}(\hat{\beta}_n - \beta_0) \xrightarrow{D} N(0, A_\psi^{-1} B_\psi A_\psi^{-1})$ where

$$A_\psi = \int E(x x' \mid c - \beta_0' x > u)\, W_\psi(u)\, W_{\psi_0}(u)\, P(c - \beta_0' x > u)\, dF_0(u), \tag{4.2}$$

where $c$ is the censoring variable,

$$W_\psi(u) = \psi(u) - \frac{\int_u^\infty \psi(v)\, dF_0(v)}{1 - F_0(u)}, \qquad \psi_0(u) = -f_0'(u)/f_0(u)$$

and

$$B_\psi = \int E(x x' \mid c - \beta_0' x > u)\, W_\psi^2(u)\, P(c - \beta_0' x > u)\, dF_0(u). \tag{4.3}$$

The following theorem shows a similar result for the estimates defined by (2.9). To simplify the proofs we will only consider the case where the error scale $\sigma$ is known.

Theorem 6. Assume that:

1. $\rho$ satisfies P1, P2, P3 and P4 and is three times continuously differentiable with bounded derivatives. Moreover, there exists $c_0$ such that $\rho(c_0) = \max_u \rho(u)$ and $P(\min(y, c) - \beta' x < c_0) < 1$ for all $\beta$ in a neighborhood of $\beta_0$;

2. the errors $u_i$ have a symmetric and strictly unimodal density $f_0$ with finite information for location, that is, $\int (f_0'(u)/f_0(u))^2 f_0(u)\, du < \infty$;

3. the vector of explanatory variables $x$ has compact support; and

4. the matrix $A_\psi$ defined in (4.2) is nonsingular.

Then, there exists a sequence $\hat{\beta}_n$ such that (i) $\sqrt{n}\, \hat{\gamma}_n(\hat{\beta}_n) \to 0$ and (ii) $\sqrt{n}(\hat{\beta}_n - \beta_0) \xrightarrow{D} N(0, A_\psi^{-1} B_\psi A_\psi^{-1})$, where $A_\psi$ and $B_\psi$ are defined in (4.2) and (4.3), respectively.

Consider a differentiable function $\rho(u)$ satisfying P1-P4, and let $\rho' = \psi$ with $\psi'(0) = a_0 > 0$. For $c > 0$ let $\rho_c(u) = (c^2/a_0)\rho(u/c)$ and $\psi_c(u) = \rho_c'(u) = (c/a_0)\psi(u/c)$. Then the functions $\rho_c$ satisfy P1-P4 and $\lim_{c \to \infty} \psi_c(u) = u = \psi^*(u)$. It is possible to show that $A_{\psi_c} \to A_{\psi^*}$ and $B_{\psi_c} \to B_{\psi^*}$. Therefore, when $c \to \infty$ the relative asymptotic efficiency of the proposed M-estimate with respect to the Buckley and James estimate tends to 1. Choosing $c$ large enough, this relative efficiency can be as close to 1 as desired. For example, this can be obtained using $\rho(u) = \rho_T(u)$, Tukey's bi-square function, with derivative

$$\psi_T(u) = u(1 - u^2)^2\, I(|u| \le 1),$$

where $I(|u| \le 1) = 1$ if $|u| \le 1$ and 0 otherwise.
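The limit used in this argument can be checked numerically for Tukey's bi-square score. The sketch below assumes the rescaling $\psi_c(u) = (c/a_0)\,\psi(u/c)$, which is the scaling under which the limit as $c \to \infty$ is the least squares score $u$; for $\psi_T$ we have $\psi_T'(0) = 1$, so $a_0 = 1$. The grid of $u$ values and the chosen values of $c$ are arbitrary illustration choices.

```python
import numpy as np

def psi_T(u):
    """Derivative of Tukey's bi-square loss; psi_T'(0) = 1, so a_0 = 1."""
    return u * (1.0 - u ** 2) ** 2 * (np.abs(u) <= 1.0)

def psi_c(u, c, a0=1.0):
    # assumed rescaling: psi_c(u) = (c / a0) * psi(u / c)
    return (c / a0) * psi_T(u / c)

u = np.linspace(-3.0, 3.0, 601)
# largest deviation from the least squares score psi*(u) = u on the grid
gaps = {c: float(np.max(np.abs(psi_c(u, c) - u))) for c in (5.0, 50.0, 1000.0)}
print(gaps)
```

The maximal deviation on a fixed range shrinks as $c$ grows, which is the numerical counterpart of the statement that the efficiency relative to the Buckley and James estimate can be made as close to 1 as desired.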
5. Computing algorithm. Computing the estimators proposed in Section 2 requires solving a highly complex optimization problem. In this section, we present an efficient algorithm to compute the S-estimators defined in Section 2.2.

We will follow a widely used strategy to approximate the solution of complex optimization problems in robust statistics. This approach is based on generating a large number $N$ of candidate vectors $\beta_1, \ldots, \beta_N$. One way to generate these candidates is by drawing subsamples of size $p$ from the data and fitting them. The estimator is then approximated by the best candidate $\hat{\beta}_n$. The number of candidates $N$ required to obtain a good approximation can be determined explicitly as in the uncensored case (Rousseeuw and Leroy [15]). In other words, if $\beta_1, \ldots, \beta_N$ are the resampling candidates described above, the approximated estimator $\hat{\beta}_n$ satisfies $\hat{\beta}_n = \beta_k$, where

$$\hat{\gamma}_n(\beta_k)' A_n \hat{\gamma}_n(\beta_k) = \min_{1 \le j \le N} \hat{\gamma}_n(\beta_j)' A_n \hat{\gamma}_n(\beta_j).$$

We now turn our attention to the calculation of $\hat{\gamma}_n(\beta_j)$ for each candidate $\beta_j$. Recall that this requires solving the minimization problem given by (2.12). For each $\beta_j$ consider a large number of candidates for $\gamma$ and set $\hat{\gamma}_n(\beta_j)$ to be the best of these candidates. Note that for each fixed $\beta_j$, if $\beta_r$ is a good approximation to the true $\beta_0$, then the vector $\beta_r - \beta_j$ is a natural candidate for $\hat{\gamma}_n(\beta_j)$. This observation follows by noting that in this case the residuals $r_i(\beta_j)$ will follow a linear regression model with coefficients $\beta_0 - \beta_j$. Then, we approximate $\hat{\gamma}_n(\beta_j)$ by the vector $\beta_r - \beta_j$ satisfying

$$S_n(\beta_j, \beta_r - \beta_j) = \min_{1 \le l \le N} S_n(\beta_j, \beta_l - \beta_j),$$

where for each pair $\beta, \gamma \in \mathbb{R}^p$, $S_n(\beta, \gamma)$ is the M-scale estimator defined in (2.11).

Note that, in principle, this algorithm requires finding $N^2$ scales $S_n(\beta_j, \beta_l - \beta_j)$, $l, j = 1, \ldots, N$. However, this is not always necessary. Suppose that we have already computed $\hat{\gamma}_n(\beta_j)$ for $j = 1, \ldots, i$ and let

$$\kappa_i = \min_{1 \le j \le i} \hat{\gamma}_n(\beta_j)' A_n \hat{\gamma}_n(\beta_j),$$

the best value of the objective function obtained so far. We will need to compute $\hat{\gamma}_n(\beta_{i+1})$ only if

$$\hat{\gamma}_n(\beta_{i+1})' A_n \hat{\gamma}_n(\beta_{i+1}) < \kappa_i.$$

Divide the set of candidates for $\hat{\gamma}_n(\beta_{i+1})$ into two sets: those with $(\beta_k - \beta_{i+1})' A_n (\beta_k - \beta_{i+1}) \ge \kappa_i$ (call them $\gamma_1, \ldots, \gamma_{N_1}$) and those with $(\beta_k - \beta_{i+1})' A_n (\beta_k - \beta_{i+1}) < \kappa_i$ (call them $\tilde{\gamma}_1, \ldots, \tilde{\gamma}_{N_2}$). Note that $\hat{\gamma}_n(\beta_{i+1})' A_n \hat{\gamma}_n(\beta_{i+1}) < \kappa_i$ only if

$$\min_{1 \le j \le N_1} S_n(\beta_{i+1}, \gamma_j) > \min_{1 \le j \le N_2} S_n(\beta_{i+1}, \tilde{\gamma}_j).$$
Fig. 2. Heart transplant data. "1"s indicate deaths, "0"s indicate censored observations. The least squares estimator seems to be influenced by the two young patients that die early in the study.



Hence, we first compute w = mini<j<jv 2 Snifli+iiJj)- Then we compare each 
S n (/3i + i,lm) for m = 1, . . . ,i\Ti with to. If for some m , we find S n ((3 i+1 ,i mo ) < 
u then we stop and set Kj+i = K{. Since Ki — > we expect E(N\) to decrease 
as well. Our Monte Carlo experiments show that there is a substantial gain 
in speed with this modified algorithm. 
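
The early-stopping rule just described can be sketched in a few lines of
Python. This is an illustrative sketch only, not the authors' implementation:
the function name `search_with_pruning` and the callable `scale(i, l)`, which
stands in for $S_n(\beta_i, \beta_l - \beta_i)$, are hypothetical, and exact
ties between scales are ignored.

```python
import numpy as np

def search_with_pruning(betas, A, scale):
    """Pick the candidate beta_i minimizing gamma_n(beta_i)' A gamma_n(beta_i).

    betas : (N, p) array of candidate coefficient vectors
    A     : (p, p) positive definite matrix (plays the role of A_n)
    scale : hypothetical callable; scale(i, l) returns the M-scale of the
            residuals for candidate i and shift betas[l] - betas[i]
    Returns (best index, best objective value kappa, number of scale calls).
    """
    n_cand = len(betas)
    kappa, best, n_eval = np.inf, None, 0
    for i in range(n_cand):
        diffs = betas - betas[i]
        # Quadratic form (beta_l - beta_i)' A (beta_l - beta_i) for every l.
        quad = np.einsum("lp,pq,lq->l", diffs, A, diffs)
        set2 = np.flatnonzero(quad < kappa)   # shifts that could improve kappa
        set1 = np.flatnonzero(quad >= kappa)  # shifts that cannot improve it
        if set2.size == 0:
            continue
        s2 = np.array([scale(i, l) for l in set2])
        n_eval += set2.size
        omega = s2.min()
        pruned = False
        for l in set1:
            n_eval += 1
            if scale(i, l) < omega:
                # The minimizing shift lies in set 1, so candidate i cannot
                # beat the current best: stop early and keep kappa unchanged.
                pruned = True
                break
        if not pruned:
            kappa = quad[set2[s2.argmin()]]
            best = i
    return best, kappa, n_eval
```

The returned count of scale evaluations can be compared with $N^2$ to see how
much work the pruning saves for a given set of candidates.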

6. Example. Consider the Heart dataset analyzed in Kalbfleisch and
Prentice [8]. These data contain information on heart transplant recipients,
including their age and their survival times, which are censored in some
cases. In Figure 2, we plot log(survival time) versus age for these
patients. We indicate uncensored cases with the symbol "1" and censored ones
with "0". In the same figure, we also show the fitted lines corresponding
to our modified extensions of the LS and MM-estimators. Note that the LS
estimator is strongly influenced by the early deaths of two young patients,
who can be considered outliers. We drew small diamonds around these points
to identify them on the plot. We also plot the LS fit obtained with these two
points removed. Note that this line is now close to the robust fit.

7. Monte Carlo study. To study the finite-sample properties of these
estimators we performed a Monte Carlo study for the simple regression model

$y_i = \alpha + \beta x_i + u_i$, $i = 1, \ldots, n$.

Table 1
MSEs without outliers

Estimator   S       LMS     LS      MM      GM      L1
MSE         0.060   0.164   0.019   0.027   0.046   0.025

We considered 1,000 samples of size $n = 100$, independent normal errors
$u_i \sim N(0, 1)$, random covariates $x_i \sim N(0, 1)$ independent from the
errors, $\alpha = 0$ and $\beta = 1.5$. We used censoring random variables
$c_1, \ldots, c_n$ that were sampled from an independent random variable with
distribution $N(1, 1)$. With these choices we have $P(\delta = 0) = 0.32$.
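
As a rough check of this design, the sampling scheme can be simulated
directly. The sketch below assumes right censoring, that is, the observed
response is $\min(y_i, c_i)$ and $\delta_i = 1$ when $y_i \le c_i$; the function
name `simulate_sample` is hypothetical. Under this reading the censoring
probability is $1 - \Phi(1/\sqrt{4.25})$, roughly 0.31, which is close to the
value reported above.

```python
import numpy as np

def simulate_sample(n=100, alpha=0.0, beta=1.5, rng=None):
    """Draw one censored sample from the simple regression design."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.normal(0.0, 1.0, size=n)   # covariates x_i ~ N(0, 1)
    u = rng.normal(0.0, 1.0, size=n)   # errors u_i ~ N(0, 1)
    y = alpha + beta * x + u           # uncensored responses
    c = rng.normal(1.0, 1.0, size=n)   # censoring variables c_i ~ N(1, 1)
    y_obs = np.minimum(y, c)           # assumption: right censoring
    delta = (y <= c).astype(int)       # 1 = observed, 0 = censored
    return x, y_obs, delta

# Estimate the censoring proportion P(delta = 0) from one large sample.
x, y_obs, delta = simulate_sample(n=200_000, rng=np.random.default_rng(1))
censored_fraction = 1.0 - delta.mean()
print(f"estimated P(delta = 0): {censored_fraction:.3f}")
```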

We included the consistent versions under censoring proposed in this paper
of the following estimators: the least squares estimator (LS), the least
median of squares (LMS), an S-estimator (S) with 50% breakdown point
when there is no censoring in the sample, an MM-estimator (MM) with 95%
efficiency under normal errors and no censoring, the L1-estimator (L1) [an
M-estimator with $\psi(x) = \mathrm{sign}(x)$], and the GM estimator defined by

$\sum_{i=1}^n E_{F^*_{n\beta}}[\psi_1(u - a(\beta)) \mid w_{\beta i}] \, \psi_2(x_i - m_x) = 0$,

where $\psi_1(x) = \psi_2(x) = \mathrm{sign}(x)$, $a(\beta) = \mathrm{median}(F^*_{n\beta})$
and $m_x = \mathrm{median}(x_1, \ldots, x_n)$. This is the analogue of the
Mood-Brown estimator, with breakdown point 1/4. Both the S- and the
MM-estimators used $\rho$ functions in the bisquare family.

The samples were contaminated with 10% of outliers (10 observations).
These 10 observations were changed to the points $(x_0, m x_0)$, where $x_0$ was
set at 1 and 10 (resulting in low and high leverage outliers, respectively),
and $m$ ranged between 2 and 5.
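
The contamination step can be sketched as follows. The helper name
`contaminate` is hypothetical, and the choice to mark the planted points as
uncensored ($\delta_i = 1$) is an assumption of this sketch, since the text
does not say how their censoring indicators were set.

```python
import numpy as np

def contaminate(x, y_obs, delta, x0, m, n_out=10):
    """Replace the first n_out observations by the outlier (x0, m * x0)."""
    x, y_obs, delta = x.copy(), y_obs.copy(), delta.copy()
    x[:n_out] = x0
    y_obs[:n_out] = m * x0
    delta[:n_out] = 1  # assumption: planted outliers are treated as uncensored
    return x, y_obs, delta

# Grid of outlier slopes matching the columns of Tables 2 and 3.
slopes = np.arange(2.0, 5.5, 0.5)
```

With `x0 = 1` the planted points have low leverage, and with `x0 = 10` they
have high leverage, as in the two contamination scenarios above.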

In Table 1, we report the MSE for $\beta$ when there are no outliers in the
sample. Tables 2 and 3 contain the MSEs for $\beta$ for the cases $x_0 = 1$ and
$x_0 = 10$, respectively. From Table 1, we see that, as expected, the most
efficient estimator is the LS, followed by the L1 and the MM with efficiencies
of 76% and 70%, respectively. For low-leverage contamination (Table 2),
the two estimators that perform best, from a maximum MSE point of
view, are the L1 and the MM. These two estimators behave similarly,
with a small advantage for the MM. The other estimators are notably worse.
Table 3 shows that for high-leverage outliers the MM-estimator had the
smallest MSE, followed by the S-estimator. Not surprisingly, both the LS
and L1 estimators have noticeably worse MSEs than all the other estimators
considered here.
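
The efficiencies and the maximum-MSE comparison quoted above can be
recomputed from the reported tables. The short sketch below assumes that
efficiency is measured as the ratio of the LS MSE to each estimator's MSE in
the uncontaminated case, which reproduces the quoted 76% and 70%.

```python
# MSEs copied from Tables 1-3; slopes m = 2, 2.5, ..., 5 in Tables 2 and 3.
mse_clean = {"S": 0.060, "LMS": 0.164, "LS": 0.019,
             "MM": 0.027, "GM": 0.046, "L1": 0.025}
mse_x0_1 = {
    "S":   [0.10, 0.27, 0.38, 0.30, 0.20, 0.13, 0.10],
    "LMS": [0.14, 0.30, 0.54, 0.69, 0.79, 0.76, 0.78],
    "LS":  [0.03, 0.05, 0.10, 0.15, 0.23, 0.33, 0.43],
    "MM":  [0.04, 0.11, 0.17, 0.18, 0.18, 0.19, 0.20],
    "GM":  [0.09, 0.25, 0.40, 0.52, 0.62, 0.71, 0.78],
    "L1":  [0.07, 0.16, 0.20, 0.21, 0.21, 0.21, 0.21],
}
mse_x0_10 = {
    "S":   [0.25, 0.50, 0.34, 0.20, 0.11, 0.08, 0.10],
    "LMS": [0.31, 0.45, 0.58, 0.65, 0.49, 0.40, 0.38],
    "LS":  [0.24, 0.90, 1.98, 3.44, 5.09, 6.61, 7.61],
    "MM":  [0.23, 0.45, 0.30, 0.17, 0.08, 0.06, 0.07],
    "GM":  [0.15, 0.39, 0.56, 0.69, 0.79, 0.92, 1.08],
    "L1":  [0.25, 0.93, 2.04, 3.59, 5.63, 8.08, 11.03],
}

# Efficiency relative to LS (assumed definition: ratio of MSEs).
efficiency = {name: mse_clean["LS"] / mse for name, mse in mse_clean.items()}

# Worst case over the slope grid, then estimators ranked by that worst case.
max_low = {name: max(vals) for name, vals in mse_x0_1.items()}
max_high = {name: max(vals) for name, vals in mse_x0_10.items()}
rank_low = sorted(max_low, key=max_low.get)
rank_high = sorted(max_high, key=max_high.get)

for name in ("L1", "MM"):
    print(f"{name}: efficiency {efficiency[name]:.0%}")
print("low leverage, best two:", rank_low[:2])
print("high leverage, best two:", rank_high[:2])
```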

Based on these results, we may conclude that the MM-estimators have
the best overall performance.

APPENDIX: PROOFS

A.1. Consistency of .

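Proof of Theorem 1. Fix $(a, v')' \in \mathbb{R}^{p+1}$ and note that
$H^*_{n\beta_0}(a, v) = E_{H^*_{n\beta_0}}[I(u \le a, x \le v)]$, where $I(A)$
denotes the indicator function of the event $A$. Let
$u_i^* = y_i^* - \beta_0' x_i$, for $i = 1, \ldots, n$. Using (2.26), we have

$H^*_{n\beta_0}(a, v) = E_{H^*_{n\beta_0}}[I(u \le a, x \le v)]$
$= \frac{1}{n} \sum_{i=1}^n \{ \delta_i I(u_i^* \le a, x_i \le v) + (1 - \delta_i) E_{F^*_{n\beta_0}}[I(u_i \le a, x_i \le v) \mid u_i > u_i^*] \}$.

Adding and subtracting

$\frac{1}{n} \sum_{i=1}^n (1 - \delta_i) E_H[I(u_i \le a, x_i \le v) \mid u_i > u_i^*, x_i]$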
Table 2
MSEs with 10% of outliers at $x_0 = 1$

                               Slopes
Estimator   2      2.5    3      3.5    4      4.5    5
S           0.10   0.27   0.38   0.30   0.20   0.13   0.10
LMS         0.14   0.30   0.54   0.69   0.79   0.76   0.78
LS          0.03   0.05   0.10   0.15   0.23   0.33   0.43
MM          0.04   0.11   0.17   0.18   0.18   0.19   0.20
GM          0.09   0.25   0.40   0.52   0.62   0.71   0.78
L1          0.07   0.16   0.20   0.21   0.21   0.21   0.21

Table 3
MSEs with 10% of outliers at $x_0 = 10$

                               Slopes
Estimator   2      2.5    3      3.5    4      4.5    5
S           0.25   0.50   0.34   0.20   0.11   0.08   0.10
LMS         0.31   0.45   0.58   0.65   0.49   0.40   0.38
LS          0.24   0.90   1.98   3.44   5.09   6.61   7.61
MM          0.23   0.45   0.30   0.17   0.08   0.06   0.07
GM          0.15   0.39   0.56   0.69   0.79   0.92   1.08
L1          0.25   0.93   2.04   3.59   5.63   8.08   11.03

we obtain

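(A.1)  $H^*_{n\beta_0}(a, v) - H(a, v) = \frac{1}{n} \sum_{i=1}^n [g(u_i^*, x_i) - H(a, v)] + \frac{1}{n} \sum_{i=1}^n (1 - \delta_i) \big( E_{F^*_{n\beta_0}}[g(u_i, x_i) \mid u_i > u_i^*] - E_H[g(u_i, x_i) \mid u_i > u_i^*, x_i] \big)$,

where

$g(u_i^*, x_i) = \delta_i I(u_i^* \le a, x_i \le v) + (1 - \delta_i) E_H[I(u_i \le a, x_i \le v) \mid u_i > u_i^*, x_i]$,

$H$ denotes the joint distribution of the vector $(x', u)'$ and
$g(u, x) = I(u \le a, x \le v)$. Note that

$g(u_i^*, x_i) = E(I(u_i \le a, x_i \le v) \mid u_i^*, x_i, \delta_i)$

and therefore $E[g(u_i^*, x_i)] = H(a, v)$.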

Since $g$ is bounded, Kolmogorov's law of large numbers yields

$\frac{1}{n} \sum_{i=1}^n g(u_i^*, x_i) \to H(a, v)$ a.s.

Moreover, note that since $g(u, x) = I(u \le a) I(x \le v)$ we have

$E_{F^*_{n\beta_0}}[g(u_i, x_i) \mid u_i > u_i^*] = I(x_i \le v) E_{F^*_{n\beta_0}}[d(u_i) \mid u_i > u_i^*]$,

where $d(u) = I(u \le a)$. Also, because of the independence between $u$ and $x$
we have $E_H(g(u_i, x_i) \mid u_i > u_i^*, x_i) = I(x_i \le v) E_F(d(u_i) \mid u_i > u_i^*)$.
Hence, the second term in (A.1) equals

$\frac{1}{n} \sum_{i=1}^n (1 - \delta_i) I(x_i \le v) \big( E_{F^*_{n\beta_0}}[d(u_i) \mid u_i > u_i^*] - E_F(d(u_i) \mid u_i > u_i^*) \big)$.

Thus, we only need to show that

(A.2)  $\sup_b \big| E_{F^*_{n\beta_0}}[d(u) \mid u > b] - E_F[d(u) \mid u > b] \big| \to 0$ a.s.



First, note that we only need to consider the supremum over the set $b < a$,
since

$E_{F^*_{n\beta_0}}[d(u) \mid u > b] = E_F[d(u) \mid u > b] = 0$ for $b \ge a$.

Next, note that $E_F[d(u) \mid u > b] = (F(a) - F(b))/(1 - F(b))$. Thus, we need
to bound

(A.3)  $\sup_{b<a} \left| \frac{F^*_{n\beta_0}(a) - F^*_{n\beta_0}(b)}{1 - F^*_{n\beta_0}(b)} - \frac{F(a) - F(b)}{1 - F(b)} \right|$
$= \sup_{b<a} \frac{\big| (F^*_{n\beta_0}(a) - F^*_{n\beta_0}(b))(1 - F(b)) - (F(a) - F(b))(1 - F^*_{n\beta_0}(b)) \big|}{(1 - F^*_{n\beta_0}(b))(1 - F(b))}$
$\le \frac{4 \sup_b |F^*_{n\beta_0}(b) - F(b)|}{(1 - F^*_{n\beta_0}(a))(1 - F(a))}$.

Since we are assuming R1 and R2 on page 7, Corollary 1.3 of Stute and
Wang [21] implies

$\lim_{n \to \infty} \sup_b |F^*_{n\beta_0}(b) - F(b)| = 0$ a.s.

This completes the proof. □
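
For concreteness, the Kaplan-Meier quantities that appear in this proof can be
computed as in the following sketch. The function names `km_jumps` and
`conditional_prob` are hypothetical; the code uses the standard product-limit
formula and evaluates $(F^*(a) - F^*(b))/(1 - F^*(b))$, the KM counterpart of
$E_F[d(u) \mid u > b]$.

```python
import numpy as np

def km_jumps(u, delta):
    """Sorted observations and the Kaplan-Meier probability mass at each one.

    u     : observed (possibly censored) residuals
    delta : 1 for uncensored, 0 for censored
    Censored points get mass 0; with no censoring every point gets 1/n.
    """
    u = np.asarray(u, dtype=float)
    delta = np.asarray(delta, dtype=int)
    # Sort by u; on ties place uncensored observations first.
    order = np.lexsort((1 - delta, u))
    u_s, d_s = u[order], delta[order]
    n = len(u_s)
    at_risk = n - np.arange(n)              # n, n-1, ..., 1
    factors = 1.0 - d_s / at_risk           # product-limit factors
    surv_before = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    jumps = surv_before * d_s / at_risk     # mass placed at each point
    return u_s, jumps

def conditional_prob(u_sorted, jumps, a, b):
    """KM estimate of P(u <= a | u > b)."""
    if b >= a:
        return 0.0
    cdf = lambda t: float(jumps[u_sorted <= t].sum())
    tail = 1.0 - cdf(b)
    return (cdf(a) - cdf(b)) / tail if tail > 0 else float("nan")
```

When the largest observation is censored the jumps add up to less than one,
which is the usual behavior of the product-limit estimator.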



Theorem A.1. Let $(y_i^*, x_i, \delta_i)$, $i = 1, \ldots, n$, be observations
from a censored linear regression model as in Section 2, and assume that the
errors and censoring variables satisfy R1 and R2 on page 7. Furthermore, assume
that $\hat{\beta}_n \xrightarrow{P} \beta_0$ and let $H^*_{n\hat{\beta}_n}$ be
defined as above. Then

$H^*_{n\hat{\beta}_n}(u, x) \xrightarrow{P} H(u, x)$.



Proof. The proof follows the same steps as that of the previous theorem,
replacing $H^*_{n\beta_0}$ by $H^*_{n\hat{\beta}_n}$. The only difference is that
now we need to show that

$\sup_b |F^*_{n\hat{\beta}_n}(b) - F(b)| \xrightarrow{P} 0$.

Lemmas 7.1 and 7.2 in Ritov [13] show that

$\sup_b |F^*_{n\hat{\beta}_n}(b) - F(b)| \le O_p(n^{-1/2}) + O(\|\hat{\beta}_n - \beta_0\|) = o_p(1)$,

because $\hat{\beta}_n \xrightarrow{P} \beta_0$. □

A.2. Breakdown point of the S-estimator. Define the M-scale estimator
$S(F)$ for any arbitrary distribution function $F$ by

(A.4)  $S(F) = \inf\{s > 0 : E_F[\rho(x/s)] < b\}$,

where $b > 0$ and $\rho : \mathbb{R} \to \mathbb{R}_+$ satisfies P1-P4 in
Section 2. The following lemma is needed to find the breakdown point of the
S-estimators for censored observations. Its proof can be found in
Salibian-Barrera and Yohai [17].
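
To make (A.4) concrete, the following sketch computes an M-scale for a
discrete distribution by bisection on the equation $\sum_i w_i \rho(x_i/s) = b$.
It is only an illustration: the names `rho_bisquare` and `m_scale` are
hypothetical, the bisquare $\rho$ is normalized so that $a = \sup \rho = 1$, and
the tuning constant 1.547 (which makes the scale roughly consistent at the
normal model when $b = 0.5$) is an assumption of this sketch. The optional
`weights` argument is meant for point masses such as Kaplan-Meier jumps.

```python
import numpy as np

C_BISQUARE = 1.547  # assumed tuning constant for b = 0.5 and sup(rho) = 1

def rho_bisquare(x, c=C_BISQUARE):
    """Bounded bisquare loss: increases from 0 to 1 on [0, c], then stays at 1."""
    z = np.minimum(np.abs(x) / c, 1.0)
    return 1.0 - (1.0 - z**2) ** 3

def m_scale(x, b=0.5, weights=None, c=C_BISQUARE, n_iter=100):
    """Solve sum_i w_i * rho(x_i / s) = b for s by bisection."""
    x = np.asarray(x, dtype=float)
    if weights is None:
        w = np.full(x.shape, 1.0 / x.size)
    else:
        w = np.asarray(weights, dtype=float)

    def mean_rho(s):
        return float(np.sum(w * rho_bisquare(x / s, c)))

    # If the mass away from zero is at most b, the mean of rho never
    # exceeds b and the scale collapses to zero.
    if np.sum(w[x != 0]) <= b:
        return 0.0
    hi = float(np.max(np.abs(x)))
    while mean_rho(hi) > b:      # enlarge until the mean of rho is <= b
        hi *= 2.0
    lo = hi
    while mean_rho(lo) <= b:     # shrink until the mean of rho is > b
        lo /= 2.0
    for _ in range(n_iter):      # mean_rho is nonincreasing in s
        mid = 0.5 * (lo + hi)
        if mean_rho(mid) > b:
            lo = mid
        else:
            hi = mid
    return hi
```

Because $\rho$ is bounded, a small fraction of arbitrarily large values changes
the mean of $\rho$ by at most that fraction, which is the mechanism behind the
bounds in Lemma A.1.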

Lemma A.1. Let $S(F)$ be a scale estimator defined by (A.4) where $\rho$
satisfies properties P1-P4. Then we have:

(a) Given any $K > 0$ and $C > b/a$, there exists $K'$ such that if
(A.5)  $P_F\{|x| > K'\} > C$,
then $S(F) > K$.

(b) Given any $M > 0$ and $C < b/a$, there exists $M'$ such that if
(A.6)  $P_F\{|x| > M\} < C$,
then $S(F) < M'$.

Given a distribution function $H$ and a Borel set $B$, in the rest of the
paper we will denote by $H(B)$ the probability of $B$ under $H$, that is,
$H(B) = P_H(B)$.

Proof of Theorem 2. Observe that $S_n(\beta, \gamma)$ can be defined by
(A.7)  $E_{H^*_{n\beta}}(\rho((r - \gamma' x)/S_n(\beta, \gamma))) = b$,

and $S_n(\beta, 0)$ by

(A.8)  $E_{F^*_{n\beta}}(\rho(r/S_n(\beta, 0))) = b$.

Assume that (3.3) is not true. Then there exists a sequence of samples
$Z^{(j)} = (z_1^{(j)}, \ldots, z_n^{(j)})$, $1 \le j < \infty$,
$z_i^{(j)} = (y_i^{*(j)}, x_i^{(j)}, \delta_i^{(j)})$, such that each $Z^{(j)}$
differs from $Z$ in $t$ observations, where $t$ satisfies $t < k_0$, and such
that if we call $\hat{\beta}^{(j)} = \hat{\beta}_n(Z^{(j)})$, then

(A.9)  $\lim_{j \to \infty} \|\hat{\beta}^{(j)}\| = \infty$.

Let $\gamma_j(\beta)$ denote the function $\gamma(\beta)$ defined in (2.8) when
the sample is $Z^{(j)}$. We will show that (A.9) is not possible by proving that

(A.10)  $\lim_{j \to \infty} \|\gamma_j(\hat{\beta}^{(j)})\| = \infty$

and that

(A.11)  $\sup_j \|\gamma_j(0)\| < \infty$.

Let us start by proving (A.11). Assume that it is not true. Then without
loss of generality we can assume that

(A.12)  $\lim_{j \to \infty} \|\gamma_j(0)\| = \infty$

and that

(A.13)  $\lim_{j \to \infty} \frac{\gamma_j(0)}{\|\gamma_j(0)\|} = \lambda$.

We will show that this is not possible by proving that

(A.14)  $\lim_{j \to \infty} S_n^{(j)}(0, \gamma_j(0)) = \infty$

and

(A.15)  $\sup_j S_n^{(j)}(0, 0) < \infty$,

where $S_n^{(j)}(\beta, \gamma)$ denotes the function $S_n(\beta, \gamma)$ when
the sample is $Z^{(j)}$. Let $F^{*(j)}_{\beta\gamma}$ denote the distribution of
$r - \gamma' x$ when $(r, x)$ has distribution $H^*_{n\beta}$ and the sample is
$Z^{(j)}$. Let

(A.16)  $M = \max_{1 \le i \le n} |y_i^*| + 1$.


Then the $y_i^*$'s in $Z^{(j)}$ that are neither contaminated nor censored will
have absolute value smaller than $M$. Moreover, $F^{*(j)}_{00}$ gives at least
mass $1/n$ to each of these points. Therefore,
$F^{*(j)}_{00}(|y| < M) \ge (n - m - t)/n$. Since $t < k_0$, using (3.2) it
follows that $(n - m - t)/n > 1 - b/a$. Thus, from Lemma A.1(b) there exists
$M'$ such that $S_n^{(j)}(0, 0) < M'$ for all $j$, and (A.15) holds.

We now turn our attention to (A.14). Let $\xi_i = |\lambda' x_i|$,
$1 \le i \le n$, where $\lambda$ is defined in (A.13), and let

(A.17)  $\xi = \min\{\xi_i : \xi_i > 0\}/2$.

Then, for all the elements of the original sample, except at most $q$, we have
$|\lambda' x_i| > \xi$. All the contaminated samples $Z^{(j)}$ have at least
$n - q - m - t$ noncensored observations from the original sample $Z$ such that
$|\lambda' x_i^{(j)}| > \xi$. Then, for $j$ large enough, at least
$n - q - m - t$ observations in $Z^{(j)}$ satisfy

(A.18)  $|y_i^{(j)} - \gamma_j(0)' x_i| \ge \|\gamma_j(0)\| \left| \frac{\gamma_j(0)'}{\|\gamma_j(0)\|} x_i \right| - M$.

Fix $K > 0$ arbitrary and let $K'$ be as in Lemma A.1(a) with $C$ any real
number satisfying

(A.19)  $\frac{h_0}{n} > C > \frac{b}{a}$,

where $h_0$ is the smallest integer larger than $nb/a$. Since $t < k_0$, by
(3.2) we have $(n - q - m - t)/n > b/a$, and then

(A.20)  $(n - q - m - t)/n > C$.

Because of (A.12) and (A.13), we can always find $j_0$ large enough so that the
right-hand side of (A.18) is larger than $K'$ for all $j \ge j_0$. Moreover,
$F^{*(j)}_{0\gamma_j(0)}$ gives at least mass $1/n$ to those residuals
$y_i^{(j)} - \gamma_j(0)' x_i$. Hence, by (A.20), for $j \ge j_0$ we have

$F^{*(j)}_{0\gamma_j(0)}(|y| > K') \ge (n - q - m - t)/n > C$.

From Lemma A.1(a) it follows that $S_n^{(j)}(0, \gamma_j(0)) > K$ for all
$j \ge j_0$, and this proves (A.14).

We now prove (A.10). Assume that it is not true. Then we would have

(A.21)  $\sup_j \|\gamma_j(\hat{\beta}^{(j)})\| = L < \infty$.

To show that this is not possible we will prove that

(A.22)  $\lim_{j \to \infty} S_n^{(j)}(\hat{\beta}^{(j)}, \gamma_j(\hat{\beta}^{(j)})) = \infty$

and

(A.23)  $\sup_j S_n^{(j)}(\hat{\beta}^{(j)}, -\hat{\beta}^{(j)}) < \infty$.

To show (A.23) let $M$ be as in (A.16) and observe that there are at least
$n - m - t$ observations in $Z^{(j)}$ with $|y_i^*| < M$. It is easy to see that
$F^{*(j)}_{\hat{\beta}^{(j)}, -\hat{\beta}^{(j)}}$ gives mass at least $1/n$ to
these observations, and the proof follows as that of (A.15) above.

We will now prove (A.22). Without loss of generality assume that

(A.24)  $\lim_{j \to \infty} \frac{\hat{\beta}^{(j)}}{\|\hat{\beta}^{(j)}\|} = \lambda$.

Let $\xi$ be as defined in (A.17). Then for all the elements of the original
sample, except at most $q$, we have $|\lambda' x_i| > \xi$. All the contaminated
samples $Z^{(j)}$ have at least $n - q - m - t$ noncensored observations from
the original sample $Z$ with $|\lambda' x_i| > \xi$. Then, for $j$ large enough,
at least $n - q - m - t$ observations in $Z^{(j)}$ satisfy

(A.25)  $|y_i^{(j)} - \alpha^{(j)\prime} x_i| \ge \|\hat{\beta}^{(j)}\| \left| \frac{\alpha^{(j)\prime}}{\|\hat{\beta}^{(j)}\|} x_i \right| - M$,

where $\alpha^{(j)} = \hat{\beta}^{(j)} + \gamma_j(\hat{\beta}^{(j)})$. From
(A.9), (A.21) and (A.24) it is easy to see that
$\lim_{j \to \infty} \alpha^{(j)}/\|\hat{\beta}^{(j)}\| = \lambda$. Observing
that $F^{*(j)}_{\hat{\beta}^{(j)}, \gamma_j(\hat{\beta}^{(j)})}$ gives at least
mass $1/n$ to these $n - m - q - t$ residuals of the form
$y_i^{(j)} - \alpha^{(j)\prime} x_i$, and that the right-hand side of (A.25) can
be made arbitrarily large, the rest of the proof follows the same lines as that
of (A.14). □

A.3. Breakdown point of MM-estimators. The following theorem is needed
to find the breakdown point of MM-estimators when the response variable
can be censored.

Theorem A.2. Let $Z = (z_1, \ldots, z_n)$ with $z_i = (y_i^*, x_i, \delta_i)$
and $x_i \in \mathbb{R}^p$ be a sample from a censored linear regression model.
Let $\hat{\beta}_{1n}$ be any regression estimator, and let
$F^*_n = F^*_{n\hat{\beta}_{1n}}$ be the KM estimator of the corresponding
residual distribution. Let $\rho_1$ and $\rho_2$ be two functions satisfying
P1-P4, and such that $\rho_2 \le \rho_1$ and $a = \sup \rho_2 = \sup \rho_1$.
Define $s_n = S(F^*_n)$, where $S$ is an M-scale functional based on $\rho_1$
and $0 < b < a$. Let $\hat{\beta}_{2n}$ be another estimator satisfying

(A.26)  $E_{H^*_n}(\rho_2((u + (\hat{\beta}_{1n} - \hat{\beta}_{2n})' x)/s_n)) \le E_{H^*_n}(\rho_2(u/s_n))$.

Assume that the rank of $\{x_1, \ldots, x_n\}$ is $p$, let
$q = \max_{\|\theta\| = 1} \#\{i : \theta' x_i = 0\}$ and
$m = \sum_{i=1}^n (1 - \delta_i)$. Then

(A.27)  $\varepsilon^*_n(\hat{\beta}_{2n}, Z) \ge \min(\varepsilon^*_n(\hat{\beta}_{1n}, Z), (1 - b/a) - (q + m)/n, b/a - m/n)$.

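Proof. Let $\varepsilon_0$ be the right-hand side of (A.27) and assume that the
theorem is not true. Then there exists a sequence of samples
$Z^{(j)} = (z_1^{(j)}, \ldots, z_n^{(j)})$, $1 \le j < \infty$,
$z_i^{(j)} = (y_i^{*(j)}, x_i^{(j)}, \delta_i^{(j)})$, such that each $Z^{(j)}$
differs from $Z$ in $t < \varepsilon_0 n$ observations and such that
$\lim_{j \to \infty} \|\hat{\beta}_{2n}^{(j)}\| = \infty$. Since
$t < \varepsilon^*_n(\hat{\beta}_{1n}, Z) n$ we have
$\sup_j \|\hat{\beta}_{1n}^{(j)}\| < \infty$. Hence, if we call
$\gamma_n^{(j)} = \hat{\beta}_{1n}(Z^{(j)}) - \hat{\beta}_{2n}(Z^{(j)})$, then
(A.28)  $\lim_{j \to \infty} \|\gamma_n^{(j)}\| = \infty$.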

Moreover, in all the samples $Z^{(j)}$, $1 \le j < \infty$, there are at least
$n - t - m > (1 - b/a)n$ noncensored observations from the original sample.
Since $\sup_j \|\hat{\beta}_{1n}^{(j)}\| < \infty$ we have that the residuals
$r_i^*(\hat{\beta}_{1n}^{(j)})$ for these $n - t - m$ observations remain
bounded uniformly in $j$. Let $F_n^{*(j)}$ be $F^*_n$ when the sample is
$Z^{(j)}$, and let $s_n^{(j)} = S(F_n^{*(j)})$. Then it is clear that
$F_n^{*(j)}$ assigns probability at least $1/n$ to these residuals, and hence by
Lemma A.1(b) we have $s^+ = \sup_j s_n^{(j)} < \infty$. Without loss of
generality assume that

(A.29)  $\lim_{j \to \infty} \frac{\gamma_n^{(j)}}{\|\gamma_n^{(j)}\|} = \lambda$.

Let $M = \max_{1 \le i \le n} |y_i^*| + 1$, $\xi_i = |\lambda' x_i|$,
$1 \le i \le n$, and $\xi = \min\{\xi_i : \xi_i > 0\}/2$. Note that all the
contaminated samples $Z^{(j)}$ have at least $n - q - m - t$ noncensored
observations $z_i^{(j)} = (y_i^{(j)}, x_i^{(j)}, \delta_i^{(j)})$ from the
original sample $Z$ which have $|\lambda' x_i| > \xi$. Then, since for $j$ large
enough

(A.30)  $|y_i^{(j)} - \gamma_n^{(j)\prime} x_i| \ge \|\gamma_n^{(j)}\| \left| \frac{\gamma_n^{(j)\prime}}{\|\gamma_n^{(j)}\|} x_i \right| - M$,

by (A.29) and (A.28), there are at least $n - q - m - t$ observations in
$Z^{(j)}$ such that $|y_i^{(j)} - \gamma_n^{(j)\prime} x_i| \to \infty$. Since
$n_0 = n - q - m - t > nb/a$ we can choose $bn/n_0 < p < a$ and let
$M = \rho_2^{-1}(p)$. There exists a $j_0$ sufficiently large such that for
$j \ge j_0$ these $n_0$ observations satisfy

$|y_i^{(j)} - \gamma_n^{(j)\prime} x_i| / s^+ > M$.

Noting that the distribution function $H^*_n$ assigns at least mass $1/n$ to
each of these $n_0$ observations, we can conclude that

(A.31)  $E_{H^*_n}(\rho_2((u + (\hat{\beta}_{1n} - \hat{\beta}_{2n})' x)/s_n)) \ge \frac{n_0}{n} \rho_2(M) \ge \frac{n_0}{n} p > \frac{n_0}{n} \frac{bn}{n_0} = b$.

On the other hand, by the definition of $s_n$ we have
(A.32)  $E_{H^*_n}(\rho_2(u/s_n)) \le E_{H^*_n}(\rho_1(u/s_n)) = b$.

Finally, note that (A.31) and (A.32) contradict (A.26). □

Proof of Theorem 3. Follows immediately from Theorem A.2. □
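
To see how censoring enters the bound (A.27), the right-hand side can be
evaluated for hypothetical inputs. In the sketch below the function name and
all numerical values ($n = 100$, $m = 32$ censored points, $q = 1$, $b/a = 0.5$,
and the breakdown point assumed for the initial estimator) are illustrative
choices, not results from the paper.

```python
def mm_breakdown_lower_bound(eps1, n, q, m, b_over_a=0.5):
    """Right-hand side of (A.27).

    eps1     : breakdown point of the initial estimator (assumed known)
    n        : sample size
    q        : largest number of x_i lying on a hyperplane through the origin
    m        : number of censored observations
    b_over_a : ratio b / a of the M-scale
    """
    return min(eps1,
               (1.0 - b_over_a) - (q + m) / n,
               b_over_a - m / n)

# Hypothetical sample with 32 censored responses out of 100.
with_censoring = mm_breakdown_lower_bound(eps1=0.17, n=100, q=1, m=32)
# Same configuration without censoring and a 49% initial estimator.
without_censoring = mm_breakdown_lower_bound(eps1=0.49, n=100, q=1, m=0)
print(round(with_censoring, 2), round(without_censoring, 2))
```

In this illustration the guaranteed breakdown point drops by roughly the
censoring fraction, since every censored observation is subtracted from the
number of points the bound can rely on.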

A.4. Consistency of the S-regression estimator. Some auxiliary results
are needed to prove our main result in this section (Theorem 5). The
following lemma is proved as Lemma 7 in Salibian-Barrera and Yohai [17].

Lemma A.2. Let $\rho$ satisfy regularity conditions P1-P4. Let
$H_n(u, x) \to F_0(u) G_0(x) = H_0$ a.s., where $F_0$ is symmetric and has a
unimodal density, and $G_0(\beta' x \ne 0) > t$ for all $\beta \in \mathbb{R}^p$.
Then for any $s > 0$ and any $b^* < ta$ there exists $K$ such that

$\liminf_{n \to \infty} \inf_{\|\beta\| > K} E_{H_n}(\rho((u - \beta' x)/s)) > b^*$ a.s.

The next lemma is proved as Lemma 9 in Salibian-Barrera and Yohai [17].

Lemma A.3. Let $\rho$ satisfy regularity conditions P1-P4. Let
$H_n(u, x) \to F_0(u) G_0(x) = H_0$ a.s., where $F_0$ is symmetric and has a
unimodal density, and $G_0(\beta' x \ne 0) > b/a$ for all
$\beta \in \mathbb{R}^p$. Let $s_0$ be defined by $E_{F_0}(\rho(u/s_0)) = b$.
Then given $\varepsilon > 0$ and $K$ there exist $s_1 > s_0$ and $b_1 > b$ such
that

$\lim_{n \to \infty} \inf_{\varepsilon \le \|\beta\| \le K} E_{H_n}(\rho((u - \beta' x)/s_1)) > b_1$.

The next lemma is proved as Lemma 10 in Salibian-Barrera and Yohai
[17].

Lemma A.4. Let $\rho$ satisfy regularity conditions P1-P4. Let
$H_n(u, x) \to H_0(u, x) = F_0(u) G_0(x)$ a.s., where $F_0$ is symmetric and has
a unimodal density, and $G_0(\beta' x \ne 0) = t > b/a$ for all
$\beta \in \mathbb{R}^p$. Let $s_0$ be defined by $E_{F_0}(\rho(u/s_0)) = b$;
then if $s_1 > s_0$ we have $\lim_{n \to \infty} E_{H_n}(\rho(u/s_1)) < b$.

Proof of Theorem 5. Observe that $S_n(\beta, \gamma)$ is the value $s$
satisfying $E_{H^*_{n\beta}}(\rho((r - \gamma' x)/s)) = b$. We know by Theorem 1
that $H^*_{n\beta_0}(u, x) \to H_0(u, x) = F_0(u) G_0(x)$ a.s. for all $u$ and
$x$. Define $s_0$ by $E_{F_0}(\rho(u/s_0)) = b$. Then using Lemma A.2 with
$s = s_0 + 1$, we can find $K$ such that

$\liminf_{n \to \infty} \inf_{\|\gamma\| > K} S_n(\beta_0, \gamma) > s_0 + 1$ a.s.

Let $\varepsilon > 0$ be arbitrary. For this $\varepsilon$ and the $K$ found
above, by Lemma A.3, we can find $s_1 > s_0$ such that

$\liminf_{n \to \infty} \inf_{\varepsilon \le \|\gamma\| \le K} S_n(\beta_0, \gamma) > s_1$ a.s.

Take $s_2$ such that $s_0 < s_2 < \min(s_0 + 1, s_1)$. By Lemma A.4 we have that
$\lim_n S_n(\beta_0, 0) < s_2$ a.s. This implies that, with probability 1, there
exists $n_0$ such that for all $n \ge n_0$ we have
$\|\gamma_n(\beta_0)\| < \varepsilon$. This proves the theorem. □

A.5. Asymptotic distribution. Some auxiliary results are needed to prove
Theorem 6. The following lemma is proved as Lemma 12 in Salibian-Barrera
and Yohai [17].

Lemma A.5. Let $H_n(u)$ with $u \in \mathbb{R}^p$ be a sequence of stochastic
processes such that, for each $n$ and each element of the underlying
probability space where the processes are defined, $H_n(u)$ is a distribution
function. Assume that $H_n(u) \xrightarrow{P} H(u)$ for each
$u \in \mathbb{R}^p$, where $H(u)$ is a distribution function on $\mathbb{R}^p$.
Let $g : \mathbb{R}^p \to \mathbb{R}$ be bounded and continuous; then

$E_{H_n}[g(u)] \xrightarrow{P} E_H[g(u)]$.

The following lemma is proved as Lemma 14 in Salibian-Barrera and
Yohai [17].

Lemma A.6. Let $\rho$ satisfy regularity conditions P1-P4. Let
$H_n(u, x) \xrightarrow{P} F_0(u) G_0(x) = H_0$, where $F_0$ is symmetric and has
a unimodal density, and $G_0(\beta' x \ne 0) > t$ for all
$\beta \in \mathbb{R}^p$. Assume that $t > E_{F_0}[\rho(u/\sigma)]/a$, where
$a = \sup_u \rho(u)$. For all $\varepsilon > 0$ there exists $\delta > 0$ such
that

$\lim_{n \to \infty} P\left( \inf_{\|\alpha\| > \varepsilon} C_n(\hat{\beta}_n, \alpha) < E_{F_0}(\rho(u/\sigma)) + \delta \right) = 0$,

where $C_n(\beta, \alpha) = E_{H^*_{n\beta}}(\rho((u - \alpha' x)/\sigma))$.

Proof of Theorem 6. Let 



c n (A«) = -E^ 



j=i 



u — a x,- 



a 



Wi09) 



raer 



1 



EEp* 



1=1 







Wi(/3) 



Fh* a P 

71/3 1 



ti — Q: x 



(7 



i=l 



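By Theorem 5.1 in Ritov [13], there exists a sequence $\hat{\beta}_n$ such that
(A.33)  $n^{1/2} D_n(\hat{\beta}_n) \xrightarrow{P} 0$

and $n^{1/2}(\hat{\beta}_n - \beta_0) \xrightarrow{d} N(0, A^{-1} B A^{-1})$.
Then we only have to prove that
$n^{1/2} \gamma_n(\hat{\beta}_n) \xrightarrow{P} 0$.

Using a second-order Taylor expansion around $\alpha = 0$ we obtain
(A.34)  $C_n(\hat{\beta}_n, \alpha) = C_n(\hat{\beta}_n, 0) + D_n'(\hat{\beta}_n) \alpha + \frac{1}{2} \alpha' L_n(\hat{\beta}_n) \alpha + \|\alpha\|^3 K_n(\alpha)$,
where there exist $\varepsilon_0$ and $K_0$ such that

(A.35)  $\operatorname{plim} \sup_{\|\alpha\| \le \varepsilon_0} |K_n(\alpha)| \le K_0$.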



Using Theorem A.1, we have that
$H^*_{n\hat{\beta}_n}(u, x) \to F_0(u) G_0(x)$ in probability for any $u$ and
$x$, and therefore, by Lemma A.6, we have that for any $\varepsilon > 0$, there
exists $\delta > 0$ such that

(A.36)  $\lim_{n \to \infty} P\left( \inf_{\|\alpha\| > \varepsilon} C_n(\hat{\beta}_n, \alpha) < E_{F_0}[\rho(u/\sigma)] + \delta \right) = 0$.

On the other hand, by Lemma A.5,



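(A.37)  $C_n(\hat{\beta}_n, 0) \xrightarrow{P} E_{F_0}(\rho(u/\sigma)) = d$,

(A.38)  $D_n(\hat{\beta}_n) \xrightarrow{P} -\frac{1}{\sigma} E_{H_0}(\psi(u/\sigma) x) = 0$

and

(A.39)  $L_n(\hat{\beta}_n) \xrightarrow{P} L$,

where

(A.40)  $L = \frac{1}{\sigma^2} E_{F_0}(\psi'(u/\sigma)) E_{G_0}(x x')$.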



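The next step is to prove that
$\gamma_n(\hat{\beta}_n) \xrightarrow{P} 0$. We have

$\{\|\gamma_n(\hat{\beta}_n)\| > \varepsilon\} \subset \left\{ \inf_{\|\alpha\| > \varepsilon} C_n(\hat{\beta}_n, \alpha) < d + 2\delta/3 \right\} \cup \{ C_n(\hat{\beta}_n, 0) > d + \delta/3 \}$

and therefore (A.36) and (A.37) imply
$P(\{\|\gamma_n(\hat{\beta}_n)\| > \varepsilon\}) \to 0$.

Finally, we will prove that $n^{1/2} \|\gamma_n(\hat{\beta}_n)\| = o_p(1)$. If
we denote $J_n = \{n^{1/2} \|\gamma_n(\hat{\beta}_n)\| > \varepsilon\}$, we have
to prove that for any $\varepsilon > 0$ we have

(A.41)  $\lim_{n \to \infty} P(J_n) = 0$.

According to (A.34) we have

$J_n \subset \left\{ \inf_{\varepsilon_0 \ge \|\alpha\| > \varepsilon n^{-1/2}} \left[ D_n'(\hat{\beta}_n) \alpha + \frac{1}{2} \alpha' L_n(\hat{\beta}_n) \alpha + \|\alpha\|^3 K_n(\alpha) \right] \le 0 \right\} \cup \{ \|\gamma_n(\hat{\beta}_n)\| > \varepsilon_0 \}$.

Since $P\{\|\gamma_n(\hat{\beta}_n)\| > \varepsilon_0\} \to 0$, in order to
prove (A.41) it is enough to show that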



P 



(A.42) 



inf 

£(j>||a||>en -1 / 2 
-> 1 



dD n {P n ) , 1 a' 



a 



2 a a 



>0 



and since (A.38), (A.39) and (A.40) hold, it is enough to prove that for all 

e 

p Iim sup — - — — — = 0. 



This follows from 



and (A.33). □ 



sup 

llol^en- 1 / 2 \\ a \\ 



||a||>en-i/2 II « II 

\a?D n (J3 n )\ <n y\